Enabling robots to autonomously navigate complex environments is essential for real-world deployment. Prior methods approach this problem by having the robot maintain an internal map of the world, and then use a localization and planning method to navigate through the internal map. However, these approaches often rely on a variety of assumptions, are computationally intensive, and do not learn from failures. In contrast, learning-based methods improve as the robot acts in the environment, but are difficult to deploy in the real world due to their high sample complexity. To address the need to learn complex policies with few samples, we propose a generalized computation graph that subsumes value-based model-free methods and model-based methods, with specific instantiations interpolating between model-free and model-based. We then instantiate this graph to form a navigation model that learns from raw images and is sample efficient. Our simulated car experiments explore the design decisions of our navigation model, and show that our approach outperforms single-step and $N$-step double Q-learning. We also evaluate our approach on a real-world RC car and show that it can learn to navigate through a complex indoor environment with a few hours of fully autonomous, self-supervised training. Videos of the experiments and code can be found at github.com/gkahn13/gcg.
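For readers unfamiliar with the baselines named above, the following is a minimal sketch of how an $N$-step double Q-learning target is commonly computed: the discounted sum of $N$ observed rewards, plus a bootstrapped value in which the online network selects the next action and the target network evaluates it. This is a generic textbook formulation, not the authors' code; the function name, its arguments, and the toy Q-functions in the usage example are hypothetical. With $N = 1$ it reduces to the single-step double Q-learning target.

```python
from typing import Callable, Hashable, Sequence

# Hypothetical signature for a Q-function: (state, action) -> estimated value.
QFunction = Callable[[Hashable, Hashable], float]


def n_step_double_q_target(
    rewards: Sequence[float],
    done: bool,
    next_state: Hashable,
    q_online: QFunction,
    q_target: QFunction,
    actions: Sequence[Hashable],
    gamma: float,
) -> float:
    """Sketch of an N-step double Q-learning target (illustrative only).

    rewards:    r_t, ..., r_{t+N-1}; assumed truncated at episode termination.
    done:       True if the episode ended within these steps (no bootstrap).
    next_state: s_{t+N}, the state used for bootstrapping.
    """
    n = len(rewards)
    # Discounted sum of the N observed rewards.
    ret = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if done:
        return ret
    # Double Q-learning: the online network picks the greedy action...
    best_action = max(actions, key=lambda a: q_online(next_state, a))
    # ...and the target network evaluates it, reducing overestimation bias.
    return ret + (gamma ** n) * q_target(next_state, best_action)


if __name__ == "__main__":
    # Toy Q-functions over two actions (hypothetical values).
    online = lambda s, a: {0: 1.0, 1: 2.0}[a]
    target = lambda s, a: {0: 5.0, 1: 3.0}[a]
    # Online picks action 1; target values it at 3.0.
    # 1.0 + 0.5 * 1.0 + 0.5**2 * 3.0 = 2.25
    print(n_step_double_q_target([1.0, 1.0], False, "s", online, target, [0, 1], 0.5))
```

Note that the bootstrap uses the target network's value of the online network's greedy action (3.0), not the target network's own maximum (5.0); this decoupling is what distinguishes double Q-learning from standard Q-learning.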